(57) Abstract

The shape of energy changes of an auditory signal is used for identifying or representing features which can be perceived by a human
ear as representing a distinct sound picture. In order to extract information from the shape of the energy changes, the shape is preferably
represented by the shape of a transient pulse of the signal. It is preferred that envelope detection is used in order to obtain
the transient signal pulse. The energy change representing the distinct sound picture can be a phoneme or vowel. The invention also relates
to a method for identifying the energy changes in the auditory signal by comparing the shape of energy changes of the signal, which can be
represented as the shape of the transient pulse, with predetermined energy change shapes representing distinct sound pictures. The invention
also relates to a method of speech synthesis wherein a series of transient pulses is generated corresponding to the series of phonemes to be
synthesized. The invention further relates to a system for processing an auditory signal in order to reduce the bandwidth of the signal with
substantial retention of the information of the signal, the system comprising means for extracting the transient component of the auditory
signal, and means for detecting an envelope of the transient component. Such a system may be used as a pre-process system in an electronic
system for speech or sound analysis. The methods and systems of the invention may be used within the fields of speech recognition, speech
synthesizing, narrow band telecommunication, hearing aids, and quality measurement of audio products.

METHOD AND SYSTEM FOR DETECTING AND GENERATING TRANSIENT
CONDITIONS IN AUDITORY SIGNALS



The present invention relates to a method and system for
signal processing, by which method and system features
representing distinct sound pictures in auditory signals are
extracted from transients in the auditory signals. The result
of the processing may be used for identification of sound or
speech signals or for quality measurement of audio products
or systems, such as loudspeakers, hearing aids, and
telecommunication systems, or for quality measurement of
acoustic conditions. The method of the present invention may
also be used in connection with speech compression and
decompression in narrow band telecommunication.

In prior art methods of analysing auditory signals, the
signals are considered to be steady state over a short
period of time, and a form of short time spectral
analysis is used under this assumption.

The human ear has the ability to simultaneously catch fast
sound signals, detect sound frequencies with great accuracy
and differentiate between sound signals in complicated sound
environments. For instance, it is possible to understand what
a singer is singing over an accompaniment of musical
instruments.

In prior art methods of signal analysis and in the method of
the present invention it is assumed that the cochlea in the
human ear can be regarded as an infinite number of bandpass
filters, IBP, within the frequency range of the human ear.

The time response f(t) for one bandpass filter due to an
excitation can be separated into two components, the
transient response, ft(t), and the steady state response,
fs(t):

(1) f(t) = ft(t) + fs(t).

Traditional signal processing is based on the steady state
response fs(t), and the transient response ft(t) is assumed
to vanish very fast and to be without importance for
perception; see for example "Principles of Circuit
Synthesis", McGraw-Hill 1959, Ernest S. Kuh and Donald O.
Pederson, page 12, lines 9-15, where it is stated that:

"only the forced response is considered while the response
due to the initial state of the network is ignored".

Thus, when students are introduced to the world of signal
analysis, they learn at a very early stage that the transient
response, i.e. the response due to the initial state of the
network, should be ignored because it vanishes within a very
short period of time. Furthermore, it is rather difficult to
analyse these transient signals by use of traditional linear
methods of analysis.

The ability of the human ear to hear very short sounds and at
the same time detect frequencies with great accuracy is in
conflict with traditional filter-based spectrum analysis.
The time window (twice the rise time) of a bandpass filter is
inversely proportional to the bandwidth,

(2) tw=2/(fu-fl) 

where fl is the lower cutoff frequency and fu is the upper 
cutoff frequency. 

Thus, if a rise time of 5 ms is required, the consequence is
that the frequency resolution is no better than 400 Hz.
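
As a rough numerical check of equation (2), the following sketch computes the bandwidth implied by a given time window. It assumes that the 5 ms figure is used as the time window tw itself, which is the reading that reproduces the 400 Hz quoted above. The function names are illustrative and do not come from the source.

```python
def time_window(f_lower_hz: float, f_upper_hz: float) -> float:
    """Time window t_w in seconds from equation (2): t_w = 2 / (fu - fl)."""
    return 2.0 / (f_upper_hz - f_lower_hz)


def bandwidth_for_window(t_w_s: float) -> float:
    """Bandwidth fu - fl in Hz that yields a time window of t_w_s seconds."""
    return 2.0 / t_w_s


if __name__ == "__main__":
    # Bandwidth (Hz) needed for a 5 ms window, under the assumption above.
    print(bandwidth_for_window(0.005))
    # Window (s) of a filter passing 2000 Hz to 2400 Hz (illustrative band).
    print(time_window(2000.0, 2400.0))
```

The two functions are inverses of each other, so a narrower band always means a longer window, which is the trade-off the paragraph describes.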

As the detection of these transients is in conflict with a
high frequency resolution, the detection of these transients
by the human ear must take place in an alternative manner. It
has not been examined how the human ear is able to detect
these signals, but it might be possible that the cochlea,
when no sounds are received, is in a position of rest, where
the cochlea will be very broad-banded. When a sound signal is
received, the cochlea may start to lock itself to the
frequency component or components within the signal. Thus,
the cochlea may be broad-banded in its starting position, but
if one or more stable frequencies are received the cochlea
may lock itself to this frequency or these frequencies with
high accuracy.

Today it is known that the nerve pulses launched from the
cochlea are synchronized to the frequency of a tone if the
frequency is less than about 1.4 kHz. If the frequency is
higher than 1.4 kHz the pulses are launched randomly and less
than once per cycle of the frequency.

Signal analysis based on filter bank spectrum analysis is
disclosed in GB 2213623, which describes a system for phoneme
recognition. This system comprises detecting means for
detecting transient parts of a voice signal, where the
principal object of the transient detection is the detection
of a point where the speech spectrum varies most sharply,
namely, a peak point. The detection of the peak points is
used for a more precise phoneme segmentation. The transient
analysis of GB 2213623 is based on a spectrum analysis and
the change in the spectrum, which is very different from
the transient analysis of the present invention, which is
based on a direct transient detection in the time domain.

The present invention is based on an approach which is
different in principle from all known methods for analysing
auditory signals. According to the invention it has been
found that the signal information relevant to the
identification of the auditory signal is present in the
transient component of the signal. Thus, the method of the
present invention involves a separation of the transient
component or response of the auditory signal, a generation of
a transient pulse corresponding to the transient component,
and analysis of the shape of the pulse. In an auditory
signal, the corresponding transient pulse may be repeated
with time intervals, and the time interval of these periodic
transient pulses is normally also analysed or determined.

In real life the human ear reacts to energy changes at high
frequencies in order to recognize phonemes or sound pictures.
In the present method, transient pulses corresponding to
the energy changes observed by the ear are extracted at these
high frequencies, whereafter the transient pulses are preferably
transformed to the low frequency range while still maintaining
the distinct features of the sound pictures or phonemes.
Thus, by using the principles of the invention, it is
possible to obtain distinct features within auditory signals
by examining the transformed low frequency signals.

As will be understood from the following explanation of the
method of the invention, the concept of extracting transient
waveforms or pulse shapes makes it possible to use pre-process
methods which are much simpler than the best designs
presently used and at the same time obtain much more valuable
information with respect to the auditory input signals.

In its broadest aspect, the invention relates to the use of
the shape of energy changes of an auditory signal for
identifying or representing features which can be perceived
by an animal ear such as a human ear as representing a
distinct sound picture.

Before entering into a more detailed explanation of features
of the method of the invention, a few definitions will be
given:

In short time analysis the transient component in a signal is
a matter of definition. The idea is to obtain an expression
that gives a response corresponding to the response in the
cochlea to an abrupt change in the signal energy. An abrupt
change in the signal energy corresponds to the transient
component in the auditory signal. Thus, in the present
context, the term "transient component" designates any signal
corresponding to an abrupt energy change in an auditory
signal. The transient component holds the signal information
to be analysed, and in order to analyse this information the
transient component may be transformed to a corresponding
transient pulse having a distinct shape. Thus, in the present
context, the term "transient pulse" refers to a pulse having
a distinct shape and substantially holding the information of
the transient component of the auditory signal and thus
corresponding to an abrupt change in the energy of the
auditory signal. As mentioned above, the transient part of a
sound signal may be repeated with time intervals and thus, in
the present context, the term "periodic", when used in
combination with a transient component, response or pulse,
designates any transient component, response or pulse being
repeated with intervals.

The term "shape" designates any arbitrary time varying
function (which is time-limited or not time-limited)
which, within a given time interval Tp, has a distinctly
different amplitude level in comparison with the amplitude
level outside the interval. Thus, Tp is the duration of the
shape function when the shape function is time-limited, or
the duration of the part of the function which has a
distinctly different amplitude level in comparison with the
amplitude level outside the time interval. As will be
understood, the identification of the shape of a pulse is
suitably performed by observing the amplitude of the pulse
along the time axis of the pulse.

In order to extract information from the shape of the energy
changes, one broad aspect of the invention relates to
representing the shape of the energy changes by the shape of a
transient pulse of the signal. Several methods can
be applied in order to obtain a transient pulse corresponding
to the change in energy, but it is preferred that envelope
detection is used, where the envelope preferably should
be detected from a transient response of the energy change in
the auditory signal.

The energy change representing the distinct sound picture can
be a phoneme or vowel or any other sound which gives a sudden
energy change in an auditory signal.

It is also an aspect of the invention to provide a method for
identifying, in an auditory signal, energy changes which can
be perceived by an animal ear such as a human ear as
representing a distinct sound picture, the method comprising
comparing the shape of energy changes of the signal with
predetermined energy change shapes representing distinct
sound pictures. For the identification it is preferred that
the shape of the energy changes is represented by the shape
of a transient pulse of the signal, and it is furthermore
preferred that the shape of the transient pulse should be
obtained by an envelope detection of a transient response of
the energy change in the auditory signal.

The invention also relates to a method for processing an
auditory signal so as to reduce the bandwidth of the signal
with substantial retention of the information of the signal,
comprising extracting the transient component of the auditory
signal, and detecting an envelope of the transient component.
It is preferred that transient pulse shapes of the signal
which can be perceived by an animal ear such as a human ear
as representing a distinct sound picture are identified.

It should be noted that the pulse rise time or the form of
the leading edge, the duration of the pulse, and the fall
time or the form of the lagging edge are all important
features for identification of the pulse. In a preferred
embodiment of the invention the shape of the leading edge of
a pulse is identified, and it is also preferred that the
shape of the leading edge is determined by determining rise
time, slope and/or slope variation of at least part of the
leading edge.

In a preferred embodiment of the invention, the rise time,
slope and/or slope variation of at least the top part of the
leading edge is determined, since the upper part of the pulse
should contain the necessary information. The top part may be
defined as the part beginning substantially at a point where
the slope is maximum. The top part may also be the part
corresponding to the upper 50% of the amplitude of the pulse.

When determining the shape of the pulse several methods may be
used, but in a preferred embodiment the rise time, slope
and/or slope variation of the leading edge is determined on
the basis of at least 5 samples. However, any other suitable
number of samples may be used. Another preferred method of
identification of the shape of the leading edge may be
performed using comparison with a library of references.
Here, the references with which comparison is made could be
selected on the basis of the rise time of the leading edge.
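
The comparison with a library of references can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how two edges are compared, so a root-mean-square difference between peak-normalised shapes is assumed here, and the reference shapes and labels in `REFERENCES` are invented for the example.

```python
import math


def normalise(shape):
    """Scale a sampled pulse shape to unit peak amplitude."""
    peak = max(abs(v) for v in shape)
    return [v / peak for v in shape] if peak else list(shape)


def distance(a, b):
    """Root-mean-square difference between two equally long shapes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def best_match(edge, library):
    """Return the label of the reference edge closest to `edge`."""
    edge = normalise(edge)
    return min(library,
               key=lambda label: distance(edge, normalise(library[label])))


# Hypothetical reference leading edges, five samples each.
REFERENCES = {
    "fast": [0.0, 0.6, 0.9, 1.0, 1.0],
    "slow": [0.0, 0.25, 0.5, 0.75, 1.0],
}

if __name__ == "__main__":
    print(best_match([0.0, 1.1, 1.9, 2.0, 2.0], REFERENCES))
```

Normalising to the peak makes the comparison depend on the shape of the edge rather than on its absolute level; a real library could additionally be pre-filtered by rise time, as the paragraph suggests.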

It is also preferred to perform an identification of the
duration of the pulse, where the duration of a pulse can be
determined as the distance from the leading edge to the
lagging edge at a predetermined amplitude.

As should be understood, it is also preferred to identify the
shape of the lagging edge of the transient pulse.
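
The pulse features named above (rise time of the leading edge and duration at a predetermined amplitude) can be measured from a sampled pulse roughly as follows. This is a minimal sketch: the 10 % and 90 % rise-time levels are a common engineering convention assumed here, not something the source specifies, and the function names are illustrative.

```python
def first_index_at_or_above(samples, level):
    """Index of the first sample whose value reaches `level`."""
    for i, value in enumerate(samples):
        if value >= level:
            return i
    raise ValueError("pulse never reaches the requested level")


def rise_time(samples, dt, low=0.1, high=0.9):
    """Time between the low and high fractions of the peak on the leading edge."""
    peak = max(samples)
    start = first_index_at_or_above(samples, low * peak)
    end = first_index_at_or_above(samples, high * peak)
    return (end - start) * dt


def duration(samples, dt, level):
    """Distance from leading edge to lagging edge at amplitude `level`."""
    above = [i for i, value in enumerate(samples) if value >= level]
    if not above:
        raise ValueError("pulse never reaches the requested level")
    return (above[-1] - above[0]) * dt


if __name__ == "__main__":
    pulse = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]   # triangular test pulse
    print(rise_time(pulse, dt=0.001))
    print(duration(pulse, dt=0.001, level=2.5))
```

The measurements work on whole samples only; interpolating between samples would give finer resolution at the cost of a slightly longer routine.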

The method of the present invention provides an expression
for the transient conditions of the auditory signal. The
method comprises a bandpass filtration of an auditory signal
within the frequency range of the human ear and a detection
of a lowpass filtered envelope, which envelope can then be
analysed with known methods of signal analysis. The envelope
is an expression of the transient part of the signal.

The known method of signal analysis which should be used
when analysing the envelope, and the characteristics of the
bandpass filter which should be selected, will depend on the
purpose of the analysis. The purpose may be speech
recognition, quality measurement of audio products or
acoustic conditions, or narrow band telecommunication.

The invention also relates to a system for processing an
auditory signal to reduce the bandwidth of the signal with
substantial retention of the information of the signal,
comprising means for extracting the transient component of
the auditory signal, and means for detecting an envelope of
the transient component.

Embodiments and details of the system appear from the claims 
and the detailed discussion of embodiments of the system 
given in connection with the figures and a mathematical 
description of an embodiment of the system. 

The invention will now be described in further detail in
connection with a mathematical description of the principle
of the invention and in connection with the drawing.

Fig. 1 shows the spectra of a bandpass filter F(ω) and a
lowpass filter H(ω),

Fig. 2 shows the zeros and the poles in the s-plane for an
infinite number of bandpass filters, IBP, having identical
bandwidth,

Fig. 3 shows the zeros and poles in the s-plane for an
infinite number of bandpass filters, IBP, having identical Q,

Fig. 4 illustrates the impulse response for various root
locations in the s-plane,

Fig. 5 shows a spectrogram for the words "linear prediction",

Fig. 6 illustrates how a summation of an infinite number of 
bandpass filters, IBP, can be performed by one bandpass 
filtration. 

Fig. 7 illustrates the principle of a transient detection
system according to the invention,

Fig. 8 shows a block diagram for a transient detection system
according to the invention,

Fig. 9 shows the characteristics of a preferred highpass
filter to be used in the system of Fig. 8, 

Fig. 10 shows the characteristics of a preferred lowpass 
filter to be used in the system of Fig. 8, 

Fig. 11 illustrates the sensitivity of the human ear, 

Fig. 12 illustrates average formant frequencies for the
American vowels /i(:)/, /æ(:)/, /a(:)/, and /u(:)/,

Fig. 13 shows the experimental results of the first transient
analysis of the vowels of Fig. 12,

Fig. 14 shows processed curves of the vowel "i" as in "heat",

Fig. 15 shows similar curves as in Fig. 14 for the vowel "o"
as in "hop",

Fig. 16 shows normalized time windows for the processed 
curves of the vowel "i" as in "heat". 

Fig. 17 shows normalized time windows for the vowel "o" as in
"hop",

Fig. 18 shows normalized time windows for the vowel "a" as in 
"have", 

Fig. 19 shows a block diagram for a speech recognition system
according to the invention, and

Figs. 20-25 show transient pulses for speech synthesis of the
phonemes "i" as in "heat", "o" as in "hop", "o" as in
"ongaonga", "u" as in the Danish word "hus", "ø" as in the
Danish word "øse", and "y" as in the Danish word "lys",
respectively.

First, a mathematical explanation of the principles of the
invention is given.

A bandpass filter may be represented in the time domain by an
impulse response and can be expressed as

(3) f(t) = h(t) cos(ωc t)

where h(t) is the impulse response for a lowpass filter and
ωc is the centre frequency of the bandpass filter f(t). The
term cos(ωc t) may be regarded as representing a frequency
shift of the lowpass filter to a bandpass filter with a
centre frequency at ωc. This is illustrated in Fig. 1, where
F(ω) and H(ω) are the corresponding frequency characteristics
of f(t) and h(t).

Let each of the IBP filters be composed of a simple bandpass
filter, BP, with a zero at the origin and two complex
(conjugate) poles in the left half plane of the complex s-plane,
and let the poles of the IBP filters be placed on a straight
line. Then:

1) If the bandwidth is identical for all the IBP filters then
the rise time and the delay time will be identical for all
filters, but Q = fc/(fu-fl) will be proportional to
the centre frequency fc. The zeros and the poles are shown in
Fig. 2.

2) If Q is identical for all filters then the rise time and
the delay time will be inversely proportional to the centre
frequency while the bandwidth will be proportional to the
centre frequency. The zeros and the poles are shown in Fig. 3.






WO 94/25958



PCT/DK94/00164 






It is assumed that the rise time and the delay time are
identical for the IBP filters within the frequency range
which is of interest for the analysis of the transient
conditions. If this is not the case it is assumed that the
brain will compensate for it. The effect is only that the
rise time will be slower and the delay time will be longer
with falling frequencies (if Q is identical). The rhythm and
the shape of the transients will be the same.

In short time analysis the transient component in a signal is
a matter of definition. The idea is to get an expression that 
gives a response corresponding to the response in the cochlea 
to an abrupt change in the signal energy. An abrupt change in 
the signal energy corresponds to the transient component in 
the auditory signal. 

The composition of the transient and the steady state
component in a signal may be identified by envelope
detection, where the steady state component is the DC
component in the detected envelope and the transient
component is identified as the changes in the level of the
envelope.

The transient response may be identified by envelope 
detection. 

The envelope of the impulse response can be expressed as

(4) $f_t(t) = \sqrt{f(t)^2 + \hat{f}(t)^2}$

where $\hat{f}(t)$ is the Hilbert transform of f(t).

By substituting (3) into (4) we have

(5) $f_t(t) = \sqrt{[h(t)\cos(\omega_c t)]^2 + [\widehat{h(t)\cos(\omega_c t)}]^2}$

For the Hilbert transform we have

(6) $\widehat{u(t)\,v(t)} = u(t)\,\hat{v}(t)$

if the spectra for u(t) and v(t) do not overlap.

Hence we have

(7) $f_t(t) = \sqrt{[h(t)\cos(\omega_c t)]^2 + [h(t)\sin(\omega_c t)]^2}$

and

(8) $f_t(t) = |h(t)|$

based on the assumption that the spectrum for h(t) does not
overlap the centre frequency ωc. Under this condition the
envelope of the impulse response is independent of the centre
frequency. This is illustrated in Fig. 4 which shows how
different impulse responses will result in the same envelope.
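
The independence of the envelope from the centre frequency can be checked numerically. The sketch below is an illustration under stated assumptions, not the method of the source: it takes a Gaussian pulse as the lowpass response h(t), forms f(t) = h(t) cos(ωc t) for two centre frequencies, and computes the envelope as the magnitude of the analytic signal obtained with a plain discrete Fourier transform. The sample rate, pulse width and centre frequencies are arbitrary choices made so that the spectrum of h(t) stays far below ωc.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            # Reduce k*m modulo n to keep the phase argument small.
            angle = sign * 2.0 * math.pi * ((k * m) % n) / n
            acc += x[m] * cmath.exp(1j * angle)
        out.append(acc / n if inverse else acc)
    return out


def envelope(x):
    """Magnitude of the analytic signal x + j*Hilbert(x); len(x) must be even."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC and Nyquist, double positive frequencies, drop negative ones.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(a) for a in analytic]


def max_envelope_error(centre_hz, fs_hz=8000.0, n=256, sigma_s=0.003):
    """Largest gap between envelope(h*cos) and |h| for a Gaussian h(t)."""
    t0 = n / (2.0 * fs_hz)
    h = [math.exp(-(((m / fs_hz) - t0) / sigma_s) ** 2) for m in range(n)]
    f = [h[m] * math.cos(2.0 * math.pi * centre_hz * m / fs_hz)
         for m in range(n)]
    env = envelope(f)
    return max(abs(e - abs(v)) for e, v in zip(env, h))


if __name__ == "__main__":
    for fc in (1000.0, 2000.0):
        print(fc, max_envelope_error(fc))
```

For both centre frequencies the detected envelope should agree with |h(t)| to within numerical noise, which is what equation (8) states.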

The result of (8) causes the total envelope for the IBP
filters to be the sum of the envelopes for the individual
bandpass filters.

An accumulated transient response ftt(t) can thus be
expressed by summing ft(t). This summation can be expressed
as

(9) $f_{tt}(t) = \int_{\omega_{cl}}^{\omega_{cu}} f_t(t)\, d\omega_c$

and

(10) $f_{tt}(t) = |h(t)|\,(\omega_{cu} - \omega_{cl})$

where ωcl is the centre frequency for the lower IBP filter
and ωcu is the centre frequency for the upper IBP filter.
Fig. 5 shows a spectrogram for the words "linear prediction"
when pronounced by a man. The spectrogram is recorded with
bandpass filters with a bandwidth of 300 Hz and centre
frequencies in the range from about 150 Hz up to about 4 kHz.
The ordinate is the frequency, the abscissa is the time and
the density of the black ink indicates the signal energy. The
horizontally oriented black bands are dominating frequency
bands in the speech and are called formants. The vertical
thin lines correspond to abrupt energy changes and thus to
the transient components of the signal. A spectrogram is
usually used for formant analysis and a bandwidth of 300 Hz
is not sufficient for transient analysis, but the appearance
of the shape of the lines confirms that the transient signal
is independent of the centre frequency of the bandpass
filters.

As mentioned above, the cochlea may be regarded as having an
infinite number of bandpass filters, but it would be
advantageous to be able to detect the transient signal
without the use of a large number of bandpass filters.

Fig. 6 illustrates how a summation of an infinite number of
bandpass filters, IBP, can be performed by one bandpass
filtration, BP, having a bandwidth that covers the cutoff
frequencies of the lower and the upper IBP filter, IBPl and
IBPu. Preferably, the bandpass filter BP should be of the
maximally flat delay type, as this type of filter is well
suited for preserving the shape of a transient condition.

In practice the simplest way to detect the envelope is to use
a rectifier and a lowpass filter, see for example
"Communication Systems. An introduction to Signal and Noise
in Electrical Communication", McGraw-Hill Kogakusha 1968, A.
Bruce Carlson. From equation (10) it can be seen that the
accumulated transient component may be detected by performing
a highpass filtration, BP, covering the range of IBP that
needs to be accumulated, before the envelope detection. An
envelope detection corresponds to a frequency shift by the
centre frequency of the bandpass filter to a lowpass
filter with half the bandwidth of the bandpass filter. This
means that the cutoff frequency of the lowpass filter
determines the bandwidth of all the IBP covered by the BP.
This principle is illustrated in Fig. 7.

In Fig. 7 the digitized sound signal S(t) enters a bandpass
or highpass filter BP, 10; the output of the bandpass filter
is input into a rectifying unit 11, the output of which is
input into a lowpass filter LP, 12. The output of the lowpass
filter 12 is designated ftt(t) and represents a detection of
the envelope and thus a detection of the transient response
of the sound signal S(t).
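
The chain of Fig. 7 (filter BP, rectifying unit, lowpass filter LP) can be sketched in a few lines. This is a simplified illustration only: it assumes first-order (one-pole) digital filters in place of the filter types the text prefers, uses a half-wave rectifier, and borrows default cutoffs of roughly 3 kHz and 700 Hz from the preferred values given elsewhere in this description. The sample rate and the test burst are arbitrary assumptions.

```python
import math


def one_pole_highpass(x, cutoff_hz, fs_hz):
    """First-order highpass (stands in for the filter BP, 10)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    a = rc / (rc + dt)
    y, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def half_wave_rectify(x):
    """Rectifying unit 11: keep positive values, zero the rest."""
    return [v if v > 0.0 else 0.0 for v in x]


def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """First-order lowpass (stands in for the filter LP, 12)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    a = dt / (rc + dt)
    y, state = [], 0.0
    for v in x:
        state += a * (v - state)
        y.append(state)
    return y


def detect_transients(x, fs_hz, hp_cutoff_hz=3000.0, lp_cutoff_hz=700.0):
    """Return the detected envelope ftt for the sampled signal x."""
    filtered = one_pole_highpass(x, hp_cutoff_hz, fs_hz)
    return one_pole_lowpass(half_wave_rectify(filtered), lp_cutoff_hz, fs_hz)


def tone_burst(fs_hz=16000, silence_s=0.010, tone_s=0.020, freq_hz=3500.0):
    """Silence followed by a sine burst: an abrupt energy change."""
    n_silence = int(round(silence_s * fs_hz))
    n_tone = int(round(tone_s * fs_hz))
    tone = [math.sin(2.0 * math.pi * freq_hz * k / fs_hz) for k in range(n_tone)]
    return [0.0] * n_silence + tone


if __name__ == "__main__":
    envelope = detect_transients(tone_burst(), fs_hz=16000)
    print(max(envelope[:160]), sum(envelope[-160:]) / 160)
```

The output stays at zero during the silence and rises to a positive level once the burst starts, so the onset of the burst appears as a step-like pulse in the detected envelope.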

From the mathematical definition of a transient part of a
signal it can be concluded that the poles of h(t) will be
located on the negative real axis in the s-plane. This means
that the impulse response will not oscillate around zero
(a transient response is a non-oscillating signal). From
equation (10) it can be seen that the limits ωcl and ωcu for
the IBP filters are only a question of the magnitude of ftt(t).

The bandpass filtration, BP, sets the limits for the
summation of the transient responses of the IBP filters, and
the amplitude characteristic weights the contribution from
the IBP filters. If a lowpass filter is used instead of BP,
there will be an overlap of the spectrum for h(t) and the
centre frequency for the lower IBP filter. The bandpass
filter BP should have a bandwidth which at least equals
double the cutoff frequency of the lowpass filter LP. The
bandwidth and the amplitude characteristic can be utilized
for optimizing different signal analyses when using the
method according to the invention.

In principle the poles of the lowpass filter LP should be
located on the negative real axis for a mathematical
transient detecting system. However, when dealing with
auditory signals, it is the characteristic of the cochlea
which is decisive; but there should preferably be no
significant oscillations within the impulse response, as this
could make the transient conditions of the auditory signal
more indistinct.

The cutoff frequency of the lowpass filter LP is an
expression for the transient conditions of the signal, and
this frequency should in connection with auditory signals
result in a rise time corresponding to the rise time of the
cochlea. The cutoff frequency may be regarded as an index of
transients, where a low cutoff frequency will result in
transient detection of only those signal elements having a
slow rise time, and where a high cutoff frequency will also
result in detection of signal elements having a fast rise
time.
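
To give a feel for the link between cutoff frequency and rise time, the following sketch uses the textbook relation for a first-order lowpass filter, whose 10 % to 90 % step rise time is ln(9)/(2π fc), roughly 0.35/fc. Treating LP as first-order is an assumption made for illustration; the source does not tie the filter to this order, and the function name is hypothetical.

```python
import math


def rise_time_first_order(cutoff_hz: float) -> float:
    """10-90 % step rise time (s) of a first-order lowpass with cutoff fc."""
    return math.log(9.0) / (2.0 * math.pi * cutoff_hz)


if __name__ == "__main__":
    # Cutoff values taken from the 400-1200 Hz range discussed for LP.
    for fc in (400.0, 700.0, 1200.0):
        print(fc, rise_time_first_order(fc))
```

Under this assumption a lower cutoff gives a longer rise time, so only slowly rising signal elements pass unchanged, in line with the paragraph above.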

The fact that the nerve pulses from the ear are synchronized
to the frequency below about 1.4 kHz and not above indicates
that the ear is tone oriented below 1.4 kHz and transient
oriented above. In the transient oriented area the nerve
pulses are synchronized to transients, corresponding to
abrupt energy changes, in the signal.

The cutoff frequencies for the BP should correspond to the
transient sensitive range for the cochlea (theoretically it
should have an amplitude characteristic corresponding to the
sensitivity curve for the ear). The sensitivity curve for
human hearing indicates that the lower cutoff frequency must
be about 2 kHz and the upper about 5 kHz. The amplitude
characteristic for the BP filter will weight the
contributions from the individual IBP filters.

From the above discussion a transient detection and analysis
system according to the invention may be constructed as shown
in the block diagram of Fig. 8. In Fig. 8 a sound signal is
input into a microphone 13, the output of which is passed
through a lowpass filter 14 before being digitized by an
A/D converter 15. The output of the A/D converter, S(t), is
led to a highpass or bandpass filter BP, 10; the output of
the bandpass filter is input into a rectifying unit 11, the
output of which is input into a lowpass filter LP, 12, see
also Fig. 7. The output of the lowpass filter 12 is
designated ftt(t) and represents the transient components of
the input signal. In order to analyse the transient
components, the output signal of the lowpass filter 12 should
preferably be led into equipment for signal analysis or
recognition 16.

Figs. 9 and 10 show the characteristics of a preferred
highpass filter and lowpass filter to be used in the systems
of Figs. 7 or 8. The bandpass filter BP to be used as the
highpass filter 10 in Figs. 7 or 8 should have a lower cutoff
frequency of at least 2000 Hz, preferably about 3000 Hz. The
upper cutoff frequency should be in the range between 4500
and 7000 Hz, preferably about 6000 Hz. The characteristic
shown in Fig. 9 has a lower cutoff frequency of 3014 Hz. The
lowpass filter LP to be used in Figs. 7 or 8 should have an
upper cutoff frequency in the range of 400-1200 Hz,
preferably about 700 Hz. The characteristic shown in Fig. 10
has an upper cutoff frequency of 732 Hz. It would also be
possible to construct a transient detection system according
to Figs. 7 or 8 by using a full-wave rectifier. However, it
is preferred to use a one-way (half-wave) rectifier as
illustrated in Figs. 7 and 8.

In Fig. 11 the sensitivity of the human ear is illustrated,
shown as the response of the cochlea to auditory signals for
tones. As already mentioned, the perception is tone oriented
up to about 1.4 kHz and transient oriented above 1.4 kHz.

As mentioned above and illustrated in Fig. 6, the total
envelope for the IBP filters is obtained by a summation of
the envelopes of the individual bandpass filters, and the
summation of an infinite or high number of bandpass filters
IBP can be performed by one bandpass filtration BP. This
principle is used in the diagram shown in Fig. 7. However, a
summation of a number of bandpass filters may also be
realized by using a filter bank method in which the envelopes
of a number of individual bandpass filters are detected and
summed. Thus, each branch within the filter bank is composed
of a bandpass filter with a specific centre frequency, a
rectifying unit and a lowpass filter, and the outputs of the
lowpass filters are summed in order to obtain the total
envelope.

Now, some introductory experiments illustrated by Figs. 12
and 13 will be discussed:

Two experiments were carried out in order to evaluate the
cutoff frequencies for the BP and the LP filters, and to
evaluate the suitability of the method for speech
recognition.

1. Experiment by listening to an amplitude modulated signal

To have a first indication of the cutoff frequency for the LP
filter under controlled conditions, a listening experiment
was carried out with an amplitude modulated signal in the
sensitive frequency range for the ear. The experiment is
somewhat artificial because normally there would not be so
intensive a signal in that range, and it cannot be
recommended to verify the experiment because it is very hard
on the ear.

The carrier frequency was chosen to be 3.5 kHz and the
modulation tone was tuned up from a few Hz and upwards. Until
350-400 Hz the envelope signal sounds like a buzz. After that
it sounds first like a hollow /u(:)/ and at 600 Hz like a
sharp /i(:)/. Above 800 Hz it was not possible to hear the
envelope signal. If the tone is increased further, at a given
point one will hear different mixed tones.

The sound was of course dominated by the carrier frequency,
but it was indicated that the cutoff frequency for the LP
filter probably has to be less than 1-1.2 kHz.

The modulation index was about 0.75. When it is greater than
1, the introduction of overtones can be observed.
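
The test signal of this listening experiment can be sketched
numerically. The following is only an illustrative sketch: the
standard amplitude modulation form (1 + m cos(2 pi fm t)) cos(2 pi fc t)
is assumed, and the function name, the 44.1 kHz sampling rate
and the 50 ms duration are assumptions that do not come from
the experiment itself.

```python
import math

def am_signal(carrier_hz, mod_hz, mod_index, sample_rate_hz, duration_s):
    """Return samples of (1 + m*cos(2*pi*fm*t)) * cos(2*pi*fc*t)."""
    n = int(round(sample_rate_hz * duration_s))
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        # Envelope swings between 1 - m and 1 + m around the carrier level.
        envelope = 1.0 + mod_index * math.cos(2 * math.pi * mod_hz * t)
        samples.append(envelope * math.cos(2 * math.pi * carrier_hz * t))
    return samples

# 3.5 kHz carrier, 600 Hz modulation tone, modulation index 0.75.
# Sampling rate and duration are assumed values for illustration only.
signal = am_signal(3500.0, 600.0, 0.75, 44100.0, 0.05)
```

With a modulation index below 1 the envelope never reaches
zero, which agrees with the remark that overtones are
introduced only when the index exceeds 1.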

2. Analysis of transient signals for four vowels

Selection of vowels:

Fig. 12 shows average formant frequencies for the American
vowels /i(:)/, /æ(:)/, /a(:)/, and /u(:)/ as in heed, had,
hod, and who'd for men, women, and children. These vowels
represent a good dispersal among vowels, so they were selected
for the experiment.

The vowels were recorded (with Danish accent) as pronounced by
a man, a woman, and a child, using an ordinary cassette
recorder.

Setup for the experiment:

An analog TSD (Transient Signal Detector) was designed in
accordance with Fig. 7. The design was based on the
operational amplifier LM 833.

The specifications for the filters were:

The BP filter was a fourth order Chebyshev filter with 1 dB
ripple. The upper cutoff frequency is about 6.5 kHz and the
lower is adjustable from about 550 Hz to 2.6 kHz.

The rectifier was a full rectifier that converts the negative
signal and adds it to the positive signal.

The LP filter was a second order Butterworth filter designed
to have a cutoff frequency at 1.5 kHz (the 3 dB cutoff
frequency was measured to 2.1 kHz).

Recording vowels and detecting the transient signal:

Four vowels pronounced by a man, a woman, and a child were
recorded on an ordinary radio cassette recorder. The
transient signal was detected by means of the TSD, converted,
and stored on PC by means of an 8 bit A/D converter. The
sampling rate when recording was 10 kHz, but when analysing
the recorded data only every second set of values was
considered, resulting in a sampling rate of 5 kHz. An 8 bit
A/D converter gives a poor dynamic range and therefore it was
necessary to record the vowels isolated (that means not in a
word) and this gives a more uncertain pronunciation.
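
The two numerical points of this paragraph, the halving of the
sampling rate and the limited dynamic range of an 8 bit
converter, can be illustrated as follows. This is a minimal
sketch with hypothetical function names. The dynamic range is
taken as the ideal ratio between full scale and one
quantization step, which is an assumption and not a measured
figure from the experiment. Keeping every second sample
without further filtering presupposes that the signal is
already band-limited below 2.5 kHz.

```python
import math

def decimate_by_two(samples):
    """Keep every second sample, halving the effective sampling rate."""
    return samples[::2]

def ideal_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20.0 * math.log10(2 ** bits)

recorded = list(range(10))             # stand-in for data sampled at 10 kHz
analysed = decimate_by_two(recorded)   # effective sampling rate: 5 kHz
```

On this idealized measure an 8 bit converter offers roughly
48 dB, against roughly 96 dB for a 16 bit converter.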

Figs. 13a-13p show the experimental results of the first
transient analysis of the vowels of Fig. 12.

It is possible to identify the vowel by listening to the
transient signal. By visual inspection of the time variation
of the results it could be observed that the same vowel
pronounced by a man, a woman, and a child, respectively, had
almost the same characteristics, even if differences in the
fundamental tone were observed. When recording the vowel
/a(:)/ as in the Danish word "op", a p-sound was also
recorded, which is clearly seen from the time variation of
the transient signal.

Analysis of the transient signals:

The power in the transient signals varies a lot from vowel to
vowel. The signals of the vowels /a(:)/ and /u(:)/ were very
low (especially for the man's voice), and it was necessary to
turn up the volume of the radio cassette recorder to a high
level, which caused a lot of noise.

First, a number of FFT analyses of 20 ms duration at a 5 kHz
sampling rate were made at different starting points in the
vowels. The spectra appear to be outstanding and identical
throughout the vowel. This strongly indicates that there is
important information in the signal.

In order to analyse common features, 20 ms (101 samples) were
randomly chosen from each vowel. The time signals were
smoothed by a Hamming window and the FFTs were calculated.
In Figs. 13a-13d the power spectra are shown, where the three
voices are illustrated in the same diagram for each vowel, and
the corresponding transient signals are shown separately in
Figs. 13e-13h when pronounced by a woman, in Figs. 13i-13l
when pronounced by a man and in Figs. 13m-13p when pronounced
by a child.
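
The windowing and spectrum calculation described above may be
sketched as follows. This is an illustrative sketch only: a
direct discrete Fourier transform is used in place of an FFT
routine (the result is the same, only slower), the function
names are hypothetical, and the 350 Hz test tone is an assumed
stand-in for a recorded transient signal.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(samples, sample_rate_hz):
    """Return (frequency_hz, power) pairs for bins 0..N//2 of a windowed DFT."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append((k * sample_rate_hz / n, abs(acc) ** 2))
    return spectrum

FS = 5000.0   # sampling rate used in the analysis
N = 101       # 20 ms at 5 kHz, counting both end points
tone = [math.sin(2 * math.pi * 350.0 * i / FS) for i in range(N)]
spec = power_spectrum(tone, FS)
peak_hz = max(spec, key=lambda pair: pair[1])[0]
```

With 101 samples at 5 kHz the spacing between spectral lines
is 5000/101, or about 49.5 Hz, which limits how precisely the
tops and clefts of the spectra can be located.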

The spectra are expected to have the following features:

The spectra of the same vowel pronounced by three different
voices will have some common features related to the vowel
and some features related to the voice.

The spectra of different vowels pronounced by the same voice
will have some features related to the different vowels and
some common features from the voices.

Furthermore, it must be expected that the shape of the
spectra plays a more important part than the absolute
frequencies.

From the power spectra the following can be seen:

/i(:)/ (Fig. 13a)

The most remarkable feature is that the spectra from all
three voices have an outstanding top in the frequency range
from 300-400 Hz, about 50 Hz wide, and that there is an
outstanding cleft at 200-250 Hz. Furthermore, there is a
contribution at 50 Hz. The man's voice has a contribution at
150 Hz which must be attributed to a deep voice.

/æ(:)/ (Fig. 13b)

The voices of the woman and of the man have an outstanding
cleft at 350 Hz (deeper than 50 dB). The man's voice has also
in this case a contribution at 150 Hz. The voice of the child
does not fit so well into the pattern; this might perhaps be
due to an uncertain pronunciation.

/a(:)/ (Fig. 13c)

All three voices have a top at 250-300 Hz. The frequency
range is a bit lower and not so outstanding as for the
/i(:)/. Further, there is a major contribution at 50 Hz and
below for all three voices.

/u(:)/ (Fig. 13d)

The voices of the child and of the woman are very much alike;
they have a peak at 300 and 350 Hz and a deep, wide valley at
100 Hz. The man's voice also has a peak, and the valley is as
wide as it is for the woman and the child but not so deep.
The reason for this can be the deep voice and the fact that
there is a lot of noise in the signal caused by the radio
cassette recorder.

The experiments leading to the results of Figs. 13a-p can be
seen as introductory, but the results are highly interesting,
especially when taking into consideration the simple
equipment that has been used, with a lot of noise and only an
8 bit A/D converter. In spite of this the results are
outstanding. There has been no particular data selection to
improve the results and there is therefore no doubt that the
transient condition is of decisive importance for speech
recognition.

It seems like all information might be located in the
frequency range below 500 Hz. If this is the case then the
demand on the sampling frequency will be less than 1.5 kHz
and it will be possible to analyse the speech signal very
intensively with more parallel processes. It is possible to
have more time windows, for instance 5, 20, and 40 ms, and
use spectrum analysis (FFT, LPC, CEPSTRUM, or others) to
detect some phonemes and time analysis (correlation or other
methods) to detect other phonemes.

It is most likely that a more sophisticated design of the TSD
with an AGC amplifier as preamplifier and a logarithmic or
AGC amplifier after the BP filter, in order to compensate for
variations in the energy of the bandpass filtered phonemes,
will allow very good results to be obtained and lead to a very
robust speaker independent speech recognition. Better results
may be obtained if a 12 or 16 bit A/D converter is used
instead of the 8 bit A/D converter.

Further experimental results illustrated in Figs. 14-18 will
be discussed in the following:

The method of extracting transient signal components
according to the present invention may also be regarded as a
pre-process of the auditory input signal. In order to be able
to obtain a better understanding and/or determination of the
parameters of the pre-process, a software programme was
developed, by use of which it is possible to show the output
signals and listen to the outcome after each process step of
the pre-process.

The analysis of speech signals shown in Figs. 14 and 15 has
been made by means of this software programme running on a
Compaq Deskpro 4/66i PC. This type of PC is provided with
Microsoft Windows Sound System, a microphone and a codec chip
(AD1848) from Analog Devices. The codec chip performs the
sampling, the anti-aliasing filtration and the A/D
conversion.

The speech signals shown in Figs. 14a and 15a are recorded by
means of this Sound System. The speech signal is sampled at
11.025 kHz with 16 bit linear PCM. The passband is greater
than 4.9 kHz.

Pretransient signals are shown in Figs. 14b and 15b. These
signals are the speech signals filtered by a third order IIR
digital highpass filter with a cutoff frequency at 3.0 kHz.
The filter is a bilinear transformation of a third order
Butterworth filter.

The cutoff frequency at 3.0 kHz has been chosen to get the
bandpass in the range of the most sensitive area of the
cochlea. In this case it means from 3.0 kHz to 4.9 kHz, where
4.9 kHz is given by the codec chip. The high- or bandpass
filter will be optimal if it has a maximally flat delay
characteristic in accordance with equation (10).

The transient signals shown in Figs. 14c and 15c are the
pretransient signals rectified and filtered by a second order
IIR digital lowpass filter with a cutoff frequency at about
700 Hz. The filter is a bilinear transformation of a second
order Butterworth filter.
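
The rectification and lowpass step can be sketched as below.
This is a sketch under stated assumptions and not the software
programme referred to in the text: the coefficients follow the
widely used biquad form of a bilinear-transformed second order
Butterworth lowpass filter with frequency pre-warping, full
rectification is assumed (the text only says "rectified"), the
function names are hypothetical, and the preceding highpass
stage is left out.

```python
import math

def butterworth_lowpass_biquad(cutoff_hz, sample_rate_hz):
    """Second order Butterworth lowpass via the bilinear transform.

    Returns (b0, b1, b2, a1, a2) with a0 normalised to 1.
    """
    w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
    q = 1 / math.sqrt(2)                 # Butterworth quality factor
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b0 = (1 - cos_w0) / 2 / a0
    b1 = (1 - cos_w0) / a0
    b2 = b0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    return b0, b1, b2, a1, a2

def envelope(samples, cutoff_hz=700.0, sample_rate_hz=11025.0):
    """Rectify the input (assumed full rectification) and lowpass filter it."""
    b0, b1, b2, a1, a2 = butterworth_lowpass_biquad(cutoff_hz, sample_rate_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in samples:
        x0 = abs(s)                      # rectification
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

The default values correspond to the 700 Hz cutoff frequency
and the 11.025 kHz sampling rate given in the text. The input
to `envelope` would be the pretransient signal.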

The lowpass filter shall preserve the shape of the transient
pulse corresponding to a transient response in the cochlea,
so that a filter which is able to do this will be an optimal
filter. The nerves in the cochlea are able to launch nerve
pulses with a frequency up to about 1.4 kHz. A bandwidth of
1.4 kHz for the IBP filters in the transient oriented area is
transformed by the envelope detection to a cutoff frequency
of 700 Hz for a lowpass filter, which is the reason why a
cutoff frequency at about 700 Hz has been chosen.

The transient signal may be regarded as an expression for the
energy change in the signal.

All the signals presented in Figs. 14 and 15 are normalized
to a maximum signal level, which means that the largest
absolute signal value is equal to 32766. The abscissas in
Figs. 14 and 15 represent a time interval of 50 ms and the
ordinates in Figs. 14a, 15a and Figs. 14b, 15b represent the
sound pressure of the corresponding speech signal whereas the
ordinates of Figs. 14c, 15c represent the energy of the
corresponding transient speech signal.
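
The normalization mentioned above amounts to a simple scaling.
A minimal sketch follows. The function name is hypothetical,
and the rounding to integer values is an assumption suggested
by the 16 bit PCM format, not something stated in the text.

```python
def normalize_to_peak(samples, peak=32766):
    """Scale samples so that the largest absolute value equals `peak`."""
    largest = max(abs(s) for s in samples)
    if largest == 0:
        return [0 for _ in samples]      # silent input: nothing to scale
    return [round(s * peak / largest) for s in samples]

scaled = normalize_to_peak([0.1, -0.5, 0.25])
```

Because every curve is scaled to the same peak value, the
curves of Figs. 14 and 15 can be compared in shape but not in
absolute level.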

It is possible to listen to the speech, the pretransient and
the transient signals, corresponding to Figs. 14a, 15a, 14b,
15b and 14c, 15c, respectively. One of the main demands for
selecting the filter characteristics is that the signals have
to maintain a sound which is close to the original speech
signal when listening to the above mentioned signals.

Referring to the system illustrated in Fig. 7, Fig. 14 shows
curves of the vowel "i" as in "heat", when pronounced by a
man, where (a) shows the speech signal before filtration
corresponding to the digitalized input signal S in Fig. 7,
(b) shows the signal after a highpass filtration
corresponding to the output signal of the bandpass filter 10
in Fig. 7, and (c) shows the signal after rectifying and
lowpass filtering corresponding to the output signal of the
lowpass filter 12 in Fig. 7.

Fig. 15 shows similar curves as in Fig. 14 for the vowel "o"
as in "hop".

The rise and fall time and the width or duration of the
transient pulse are observed to be of importance for the
sound in a vowel. Figs. 16-18 give examples of measured
transient pulses. The time window of the vowel "i" as in
"heat", when pronounced by a man, shown in Fig. 16a
corresponds to the processed signal shown in Fig. 14c. The
corresponding time window when the vowel "i" as in "heat" is
pronounced by a child is shown in Fig. 16b. From Figs. 16a
and 16b it can be observed that the leading and lagging edges
of the most dominant pulses are sharp, with a rise and fall
time of about 0.4 ms or less, and that the width of the
dominant pulses is about 0.8 ms when measured at the level of
about 50 %.

The time window of the vowel "o" as in "hop", when pronounced
by a man, shown in Fig. 17a corresponds to the processed
signal shown in Fig. 15c. The corresponding time window when
the vowel "o" as in "hop" is pronounced by a child is shown
in Fig. 17b. From Figs. 17a and 17b it can be observed that
the leading and lagging edges of the most dominant pulses are
sharp, with a rise and fall time of about 0.5 ms, but the
width of the dominant pulses is about 1.5 ms when measured at
the level of about 50 %. The ditch in the dominant pulses of
Fig. 17b is not deep enough to influence the perception. It
should be noted that the vowel "o" as in "hop" is a sharp
vowel, and a softer vowel will have a slower lagging edge.
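
The pulse measurements discussed above can be sketched for a
sampled pulse as follows. This is an illustration under
assumptions: linear interpolation between samples is used, the
rise time is taken between 10 % and 90 % of the peak (the text
does not state which levels were used), and all function names
are hypothetical.

```python
def crossing_times(samples, level, sample_period_ms):
    """Times (ms) at which the sampled curve crosses `level`."""
    times = []
    for i in range(1, len(samples)):
        lo, hi = samples[i - 1], samples[i]
        if (lo < level <= hi) or (lo >= level > hi):
            frac = (level - lo) / (hi - lo)   # linear interpolation
            times.append((i - 1 + frac) * sample_period_ms)
    return times

def width_at_fraction(samples, sample_period_ms, fraction=0.5):
    """Width of a single pulse measured at `fraction` of its peak value."""
    level = fraction * max(samples)
    times = crossing_times(samples, level, sample_period_ms)
    return times[-1] - times[0]

def rise_time(samples, sample_period_ms, low=0.1, high=0.9):
    """Time for the leading edge to go from `low` to `high` of the peak."""
    peak = max(samples)
    t_low = crossing_times(samples, low * peak, sample_period_ms)[0]
    t_high = crossing_times(samples, high * peak, sample_period_ms)[0]
    return t_high - t_low

# Assumed triangular test pulse sampled every 0.1 ms.
pulse = [0, 1, 2, 3, 4, 3, 2, 1, 0]
width_ms = width_at_fraction(pulse, 0.1)
rise_ms = rise_time(pulse, 0.1)
```

The input is expected to contain a single pulse, so a
measured signal would first have to be cut into windows each
holding one dominant pulse.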

Fig. 18 shows the time window for the processed signal of the
vowel "a" as in "have" when pronounced by a man. It is to be
observed that the shape of the transient pulse has softer
leading and lagging edges than the pulses shown in Figs.
16-17.

Thus, from the above results it may be concluded that the
perception of a vowel is given by the shape of the transient
pulse. It is further to be concluded that by analysing the
transient components or pulses which have been extracted from
the auditory signal by way of the above mentioned method of
signal processing, the vowels or phonemes of the speech
signal may be recognised by identifying the shape of the
transient pulse or pulses.

In a vowel or phoneme the transient pulse is repeated and the
repetition frequency gives the perception of the pitch. In
Fig. 16a the time period between two succeeding pulses is
about 6 ms, corresponding to a man's pitch at 170 Hz, and in
Fig. 16b the time period between two succeeding pulses is
about 3.5 ms, corresponding to a child's pitch at 280 Hz.

Thus, it is also to be concluded that by analysing the
transient component or pulses which have been extracted from
the auditory signal by way of the above mentioned method of
signal processing, the pitch of the speech signal may be
determined by determining the time period between the
transient pulses.
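
The relation between pulse period and pitch is the reciprocal
relation between period and frequency. A minimal sketch with
hypothetical function names follows. The list of onset times
is assumed to come from some separate pulse detection step
that is not shown here.

```python
def pitch_from_period_ms(period_ms):
    """Pitch in Hz for a pulse repetition period given in ms."""
    return 1000.0 / period_ms

def pitch_from_onsets(onset_times_ms):
    """Average pitch from successive pulse onset times (ms)."""
    intervals = [b - a for a, b in zip(onset_times_ms, onset_times_ms[1:])]
    mean_period = sum(intervals) / len(intervals)
    return pitch_from_period_ms(mean_period)

man_hz = pitch_from_period_ms(6.0)
child_hz = pitch_from_period_ms(3.5)
```

A period of 6 ms gives about 167 Hz and a period of 3.5 ms
about 286 Hz, which agrees with the approximate pitch values
of 170 Hz and 280 Hz read from Figs. 16a and 16b.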

Thus, when analysing auditory signals according to a
preferred embodiment of the present invention, it is taken
into account that the identity of the sound signal is
preserved during the signal processing which includes a
highpass filtration followed by a rectification and a lowpass
filtration of the input signal.

From the above discussion it should be understood that the
present invention provides a method which is very suitable
for use in speech recognition.

Fig. 19 shows a block diagram for a speech recognition system
according to the invention. In this system a pre-process unit
20 is provided which comprises the bandpass filter 10,
rectifying circuit 11 and lowpass filter 12 of Fig. 7. Thus,
the pre-process unit, which most conveniently may be
integrated within a single integrated circuit or chip, is a
transient detecting unit in accordance with the method of the
present invention. The system further comprises units which
are normally used in speech recognition systems, such as a
pattern recognition unit 21 connected to a reference library
22, a unit for phoneme determination 23 and a unit for
word/sentence determination 24. The system shown in Fig. 19
uses template matching but alternative approaches may be used
in a recognition system.

The reference library 22 of Fig. 19 should store a library
corresponding to the shapes which can be generated by the
pre-process unit 20.
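
How such a library might be consulted can be hinted at with a
very small sketch. The ranges below are copied from the values
stated later in this description for Figs. 20-22. Everything
else is an assumption made for illustration: the dictionary
layout, the names, and the use of a plain range check, which
is much simpler than template matching of whole pulse shapes.

```python
# Hypothetical reference library: (min, max) ranges in ms for rise time,
# duration and fall time, taken from the ranges stated for Figs. 20-22.
REFERENCE_LIBRARY = {
    "i as in heat": {"rise": (0.3, 0.5), "duration": (0.3, 1.1), "fall": (0.3, 0.5)},
    "o as in hop": {"rise": (0.3, 0.5), "duration": (1.3, 1.8), "fall": (0.3, 0.5)},
    "o as in Ole": {"rise": (0.3, 0.5), "duration": (1.3, 1.8), "fall": (1.0, 2.0)},
}

def matching_phonemes(rise_ms, duration_ms, fall_ms, library=REFERENCE_LIBRARY):
    """Return the labels whose stored ranges contain all three measurements."""
    measured = {"rise": rise_ms, "duration": duration_ms, "fall": fall_ms}
    matches = []
    for label, ranges in library.items():
        if all(lo <= measured[key] <= hi for key, (lo, hi) in ranges.items()):
            matches.append(label)
    return matches
```

For example, `matching_phonemes(0.4, 0.8, 0.4)` selects only
the entry for "i" as in "heat", since the duration of 0.8 ms
lies outside the ranges of the other two entries.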

It should be understood that a single chip pre-process unit
also may comprise the lowpass filter 14 and/or the A/D
converter 15 as shown in Fig. 8.

It is to be understood that a pre-process according to the
present invention could be used in many other electronic
systems where speech or sound analysis, recognition, coding
and/or decoding is required, such as quality measurement of
audio products or systems, such as loudspeakers, hearing
aids, and telecommunication systems, or for quality
measurement of acoustic conditions. The pre-process may also
be used in connection with speech compression and
decompression in narrow band telecommunication.

As illustrated in Fig. 10 the preferred cutoff frequency of
the lowpass filter 12 used in a pre-process unit should be
below 1 kHz. Thus, all the necessary signal information of
the auditory signals is represented within a rather narrow
frequency range of 1 kHz. This should be compared to the
bandwidth of around 9000 bits per second which is used
within the GSM mobile telecommunication system for the
communication of speech signals. By using the pre-process
method or unit of the present invention it should be possible
to decrease the bandwidth used for telecommunication
down to about 1000 bits per second, which would result in
great savings within this area of communication.

Thus, it should be understood that the present method is very
well suited for optimizing the bandwidth within narrow band
telecommunication, and it is within the scope of the
invention that when transmitting an auditory signal in a
telecommunication system, the signal should be processed by
using the pre-process described herein before being
transmitted and received by a receiver. It is preferred that
prior to transmission of the processed signal, the signal is
coded into a digital representation, and the coded signal is
decoded in the receiver so as to reestablish transient pulse
shapes perceived by the animal ear such as the human ear as
representing the distinct sound pictures of the auditory
signal.

During the above mentioned digital transmission the bandwidth
may be chosen so as to fulfil different requirements to the
quality of the received, decoded and reestablished transient
pulse. Thus, a bandwidth of at the most 4000 bits per second
may be selected, but it should be possible to obtain a good
quality of the reestablished pulse by using a bandwidth
around 2000 bits per second. However, it is preferred that
the bandwidth is in the interval of 800-2000 bits per second.
It is to be noted that for telecommunication systems where a
high system performance is preferred as opposed to a high
quality of the reestablished signal, such as for example in
military systems, a bandwidth of about 400 bits per second
may be selected.

When transmitting the digital signals it is preferred that
the digital information comprises information about leading
edge, lagging edge, and duration of the transient pulse
representing the processed auditory signal. It is also
preferred that a second and further pulses in a sequence of
identical pulses are represented by a digital sign indicating
repetition when transmitted.
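
One possible way of representing such digital information is
sketched below. The record layout, the names and the single
character used as repetition sign are all hypothetical. The
text does not prescribe any particular coding format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PulseShape:
    """Assumed description of one transient pulse (all values in ms)."""
    rise_ms: float       # leading edge
    duration_ms: float   # duration of the pulse
    fall_ms: float       # lagging edge

REPEAT = "R"   # hypothetical sign meaning "same pulse as the previous one"

def encode(pulses):
    """Replace each pulse that equals its predecessor by the repetition sign."""
    coded, previous = [], None
    for pulse in pulses:
        coded.append(REPEAT if pulse == previous else pulse)
        previous = pulse
    return coded

def decode(coded):
    """Rebuild the full pulse sequence from the coded sequence."""
    pulses = []
    for item in coded:
        pulses.append(pulses[-1] if item == REPEAT else item)
    return pulses
```

Since a vowel consists of many near-identical pulses in
succession, most items in such a coded sequence would be
repetition signs, which is where the saving in transmitted
bits would come from.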

It is also an object of the present invention to provide a
method to be used in speech synthesis.

From the discussion of the experimental results of Figs.
14-18 it should be understood that the sound of each vowel or
phoneme might be given by the shape of a dominating transient
pulse corresponding specifically to that phoneme. From
experiments it has been concluded that transient pulses
similar to the processed pulses of Figs. 16-18 hold the
necessary information in order to generate the sound of the
phoneme.

By use of the software developed for the transient analysis
illustrated in Figs. 14-18 it is possible to create a single
transient signal by placing points in a system of coordinates
where the ordinate is the amplitude and the abscissa is the
time in ms. One transient pulse may be created by placing one
or several points, interpolating between the points either by
a straight line or a sine curve, and defining a period. The
signal is repeated for 300 ms and it is possible to listen to
the signal when converted by a D/A converter in the codec
chip.
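
The construction of a repeated transient pulse can be
sketched as follows. This is not the software referred to in
the text. The function names, the flat top of the pulse, the
10 kHz sampling rate and the example values (edges of 0.4 ms,
a period of 5 ms) are assumptions chosen for illustration.
Only the 300 ms total length and the choice between straight
line and sine curve interpolation come from the paragraph
above.

```python
import math

def edge(n, kind):
    """n samples rising from 0 towards 1 (exclusive), linear or sine shaped."""
    if kind == "sine":
        return [math.sin(0.5 * math.pi * i / n) for i in range(n)]
    return [i / n for i in range(n)]

def transient_pulse(rise_ms, top_ms, fall_ms, sample_rate_hz, kind="linear"):
    """One pulse: leading edge, flat top at amplitude 1, mirrored lagging edge."""
    per_ms = sample_rate_hz / 1000.0
    n_rise = max(1, round(rise_ms * per_ms))
    n_top = max(1, round(top_ms * per_ms))
    n_fall = max(1, round(fall_ms * per_ms))
    falling = list(reversed(edge(n_fall, kind)))
    return edge(n_rise, kind) + [1.0] * n_top + falling

def pulse_train(pulse, period_ms, total_ms, sample_rate_hz):
    """Repeat the pulse with the given period until total_ms is filled."""
    per_ms = sample_rate_hz / 1000.0
    n_period = round(period_ms * per_ms)
    n_total = round(total_ms * per_ms)
    if len(pulse) > n_period:
        raise ValueError("pulse is longer than the repetition period")
    one_period = pulse + [0.0] * (n_period - len(pulse))
    repeats = -(-n_total // n_period)    # ceiling division
    return (one_period * repeats)[:n_total]

FS = 10000.0                             # assumed sampling rate in Hz
pulse = transient_pulse(0.4, 0.4, 0.4, FS)
train = pulse_train(pulse, period_ms=5.0, total_ms=300.0, sample_rate_hz=FS)
```

The resulting list of samples would still have to be scaled
and passed to a D/A converter in order to be listened to.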

It should be noted that the pulse rise time or the form of
the leading edge, the duration of the pulse, and the fall
time or the form of the lagging edge are all important
features for identification, representation and/or generation
of transient pulses for use in speech recognition and/or
synthesis. These features may also be used in connection with
speech compression.

This is illustrated in Figs. 20-25 which show how transient
pulses used for speech synthesis or identification should be
formed for the phonemes "i" as in "heat", "o" as in "hop",
"o" as in "ongaonga" or as in the Danish word "Ole", "u" as
in the word "who", "ø" as in the Danish word "øse", and "y"
as in the Danish word "lys", respectively. The pulses are
repeated within a period of 5 ms.

From Fig. 20 it can be seen that the phoneme "i" as in "heat"
could be formed by a very short pulse having a duration in
the range of 0.3-1.1 ms, with a rise time of the leading edge
being in the range of 0.3-0.5 ms. The fall time of the
lagging edge should also be in the range of 0.3-0.5 ms.

Similarly it is observed from Fig. 21 that the phoneme "o" as
in "hop" could be formed by a pulse having a duration in the
range of 1.3-1.8 ms, with a rise time of the leading edge
being in the range of 0.3-0.5 ms. The fall time of the
lagging edge should be in the range of 0.3-0.5 ms.

From Fig. 22 it is observed that the phoneme "o" as in the
Danish word "Ole" could be formed by a pulse having a
duration in the range of 1.3-1.8 ms in the upper part of the
pulse, with a rise time of the leading edge being in the
range of 0.3-0.5 ms. The fall time of the lagging edge for
this phoneme may vary, but should be in the range of 1.0-2.0
ms.

From Fig. 23 it is observed that the phoneme "u" as in the
word "who" could be formed by generating a transient pulse
with a sine curve interpolation and a duration in the range
of 1.0-2.0 ms. The preferred duration should be about 1.5 ms.

Fig. 24 shows the pulse of the phoneme "ø" as in the Danish
word "øse". Here the leading edge may have a rise time in the
range of 0.4-0.6 ms. The fall time of the lagging edge should
be in the range 1.0-2.0 ms.

Fig. 25 shows the pulse of the phoneme "y" as in the Danish
word "lys". Here the leading edge may have a rise time in the
range of 1.0-2.0 ms. The fall time of the lagging edge should
also be in the range 1.0-2.0 ms.

When synthesizing human speech in accordance with the above
mentioned principles of the invention it is preferred to
generate a series of transient pulses corresponding to the
series of phonemes which constitutes the speech to be
synthesized. It is furthermore preferred that the series of
phonemes is established from a series of letters using
rule-based conversion.

It should be understood that the principles of the invention
also can be used for quality measurement of audio products.
In such a measurement a well defined transient signal should
be transmitted to the audio product, and the distortion of
the response can be measured. The distortion may be measured
by using a pre-process in accordance with the principles
illustrated in Fig. 7.

The principles of the invention may also be used in hearing
aids in order to improve noise suppression in speech signals.
A library of features representing characteristic shapes of
the transient pulses may be used for identifying the speech
signal and separating the speech signal from the noise
background.

The experiments presented have, for the first time, shown
some common features for phonemes which are very simple to
recognize and generate, but which could be of great
significance within the whole area of recognition and
generation of speech or auditory signals.

The performance of the method and system of the present
invention is described in the time domain. It is however to
be understood that the transient signals, components and/or
pulses being described in the time domain also could be
given a corresponding description in the frequency domain,
which would naturally be within the scope of the invention.

It is also to be noted that the methods of signal processing
described above could be performed either digitally,
electronically by use of analog components, mechanically, or
by any combination thereof. Such methods of processing would
also be within the scope of the invention.

1. The use of the shape of energy changes of an auditory
signal for identifying or representing features which can be
perceived by an animal ear such as a human ear as
representing a distinct sound picture.

2. The use according to claim 1, wherein the shape of the 
energy changes of the auditory signal is represented by the 
shape of a transient pulse of the signal. 

3. The use according to claim 2, wherein the shape of a
transient pulse is obtained by use of an envelope detection.

4. The use according to any of the preceding claims, wherein
the distinct sound picture is a phoneme.

5. A method for identifying, in an auditory signal, energy
changes which can be perceived by an animal ear such as a
human ear as representing a distinct sound picture, the
method comprising comparing the shape of energy changes of
the signal with predetermined energy change shapes
representing distinct sound pictures.

6. A method according to claim 5, wherein the shape of the
energy changes are represented by the shape of a transient
pulse of the signal.

7. A method according to claim 6, wherein the shape of a
transient pulse is obtained by an envelope detection of a
transient response of the energy change in the auditory
signal.

8. A method for processing an auditory signal to reduce the
bandwidth of the signal with substantial retention of the
information of the signal, comprising extracting the
transient component of the auditory signal and detecting an
envelope of the transient component.

9. A method according to claim 8, wherein transient pulse
shapes of the signal which can be perceived by an animal ear
such as a human ear as representing a distinct sound picture
are identified.

10. A method according to claim 9, wherein the distinct sound 
picture is a phoneme. 

11. A method according to claim 6 or 9, wherein the shape of 
the leading edge of a pulse is identified. 

12. A method according to claim 11, wherein the shape of the 
leading edge is determined by determining rise time, slope 
and/or slope variation of at least part of the leading edge. 

13. A method according to claim 12, wherein the rise time, 
slope and/or slope variation of at least the top part of the 
leading edge is determined. 

14. A method according to claim 13, wherein the top part is 
the part beginning substantially at a point where the slope 
is maximum. 

15. A method according to claim 12, wherein the rise time,
slope and/or slope variation of the leading edge is
determined on the basis of at least 5 samples.

16. A method according to any of claims 11-15, wherein the
identification of the shape of the leading edge is performed
using comparison with a library of references.

17. A method according to claim 16, wherein the references
with which comparison is made are selected on the basis of
the rise time of the leading edge.

18. A method according to claim 6 or 9, wherein the duration
of a pulse is identified.

19. A method according to claim 18, wherein the duration of a
pulse is determined as the distance from the leading edge to
the lagging edge at a predetermined amplitude.

20. A method according to claim 19, wherein the predetermined
amplitude is an amplitude of at the most 50% of the maximum
amplitude of the pulse.

21. A method according to any of claims 11-20 wherein pulses
which cannot be perceived by the animal ear are discarded
from the identification.

22. A method according to claim 21, wherein a pulse the
leading edge of which has an amplitude of less than 50% of
the amplitude of the preceding pulse and an onset time of
less than 3.5 ms is disregarded.

23. A method according to any of claims 11-22, wherein the
shape of the lagging edge of a pulse is identified.

24. A method according to claim 23, wherein the shape of the
lagging edge is determined by determining fall time, slope
and/or slope variation of at least part of the leading edge.

25. A method according to any of claims 11-23, wherein the
time period between leading edges of pulses which can be
perceived by the animal ear is determined.

26. A method according to claim 25, wherein the time period
between leading edges which have a distance of at least 3 ms
from each other is determined.

27. A method for telecommunicating an auditory signal,
comprising processing the signal by the method according to
any of claims 8-26, transmitting the processed signal, and
receiving the processed signal by a receiver.

28. A method according to claim 27, wherein, prior to
transmission of the processed signal, the signal is coded
into a digital representation, and the coded signal is
decoded in the receiver so as to reestablish transient pulse
shapes perceived by the animal ear such as the human ear as
representing the distinct sound pictures of the auditory
signal.


29. A method according to claim 28, wherein the digital
transmission is performed at a bandwidth of at the most 4000
bits per second.

30. A method according to claim 29, wherein the bandwidth is
at the most 2000 bits per second.



31. A method according to claim 30, wherein the bandwidth is
in the interval of 800-2000 bits per second.

32. A method according to any of claims 28-31, wherein the
digital information comprises information about leading edge,
lagging edge, and duration of the transient pulse.

33. A method according to any of claims 28-32, wherein the
second and further pulses in a sequence of identical pulses
are represented by a digital sign indicating repetition.

34. A method according to any of the claims 8-26, wherein the
extraction of transient component comprises a bandpass
filtration or a highpass filtration.

35. A method according to any of the claims 8-26 or 34,
wherein the envelope detection comprises a rectification and
a lowpass filtration.

36. A method according to claim 34, wherein the lower cutoff
frequency of the bandpass or highpass filtration is at least
2 kHz, such as about 3 kHz.

37. A method according to claim 34 or 36, wherein the upper
cutoff frequency is in the range between 4.5 and 7 kHz,
preferably about 6 kHz.

38. A method according to claim 35, wherein the 
rectification is a one-way rectification. 

39. A method according to claims 35 or 38, wherein the cutoff
frequency of the lowpass filtration is in the range of
400-1000 Hz, preferably about 700 Hz.
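
The chain of claims 34-39 (bandpass extraction of the transient component, one-way rectification, lowpass filtration) can be sketched as below. This is a simplified illustration, not the filter design of the specification: it substitutes first-order (one-pole) digital filters, and the sample rate and function names are assumptions. The default cutoffs are the preferred values given in the claims (about 3 kHz, about 6 kHz, about 700 Hz).

```python
import math
from typing import List, Sequence


def one_pole_lowpass(x: Sequence[float], cutoff_hz: float, fs: float) -> List[float]:
    """First-order recursive lowpass (a stand-in for the claimed filtration)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state = (1.0 - a) * sample + a * state
        y.append(state)
    return y


def one_pole_highpass(x: Sequence[float], cutoff_hz: float, fs: float) -> List[float]:
    """First-order highpass formed as the input minus its lowpassed version."""
    low = one_pole_lowpass(x, cutoff_hz, fs)
    return [s - l for s, l in zip(x, low)]


def transient_envelope(
    x: Sequence[float],
    fs: float,
    low_cut_hz: float = 3000.0,   # lower cutoff, "about 3 kHz"
    high_cut_hz: float = 6000.0,  # upper cutoff, "about 6 kHz"
    env_cut_hz: float = 700.0,    # envelope lowpass, "about 700 Hz"
) -> List[float]:
    # 1) Extract the transient component with a crude bandpass.
    band = one_pole_lowpass(one_pole_highpass(x, low_cut_hz, fs), high_cut_hz, fs)
    # 2) One-way (half-wave) rectification.
    rectified = [max(s, 0.0) for s in band]
    # 3) Lowpass filtration yields the envelope.
    return one_pole_lowpass(rectified, env_cut_hz, fs)
```

For example, with an assumed sample rate of 48 kHz, a 4 kHz tone lies inside the passband and produces a clearly larger envelope than a 200 Hz tone of the same amplitude, which the bandpass largely rejects.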

40. A method according to any of the claims 8-26 or 34,
wherein the envelope detection comprises bandpass filtration 
by use of a bank of filters. 

41. A method of identifying or representing the phoneme "i" 
as in **heat", comprising identifying or generating a 
trsuisient pulse with a rise time of the leading edge of at 
the most 0.5 ms and a duration of at the most 1.1 ms. 

42. A method according to claim 41, wherein the rise time of 
the leading edge is at the most 0.4 ms, preferably at the
most 0.3 ms. 

43. A method according to claim 41 or 42, wherein the 
duration is at the most 1.0 ms, preferably about 0.8 ms.

44. A method of identifying or representing the phoneme "o"
as in "hop", comprising identifying or generating a transient
pulse with a rise time of the leading edge of at the most 0.5
ms and a duration of 1.3-1.8 ms.

45. A method according to claim 44, wherein the rise time of
the leading edge is at the most 0.4 ms, preferably at the
most 0.3 ms.

46. A method according to claim 41 or 42, wherein the fall
time of the lagging edge is at the most 0.5 ms, preferably at
the most 0.4 ms and more preferably at the most 0.3 ms.

47. A method of identifying or representing the phoneme "o" 
as in the English word "ongaonga" or the Danish word "Ole", 
comprising identifying or generating a transient pulse with a 
rise time of the leading edge of at the most 0.5 ms and a
duration of 1.3-1.8 ms. 
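
The limits in claims 41, 44 and 47 can be read as a simple lookup from measured pulse parameters to candidate phonemes. The sketch below assumes the rise time and duration have already been measured from the envelope; the function name and the short labels are hypothetical. Because claims 44 and 47 state the same limits, these two parameters alone cannot separate the two "o" sounds, so the function returns every candidate that matches.

```python
from typing import List


def candidate_phonemes(rise_ms: float, duration_ms: float) -> List[str]:
    """Return the phonemes whose claimed limits admit the measured pulse."""
    candidates: List[str] = []
    fast_edge = rise_ms <= 0.5  # rise time of at the most 0.5 ms
    if fast_edge and duration_ms <= 1.1:
        candidates.append("i/heat")  # claim 41
    if fast_edge and 1.3 <= duration_ms <= 1.8:
        candidates.append("o/hop")   # claim 44
        candidates.append("o/Ole")   # claim 47, same limits as claim 44
    return candidates


if __name__ == "__main__":
    print(candidate_phonemes(0.3, 0.8))
    print(candidate_phonemes(0.3, 1.5))
```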

48. A method of identifying or representing the phoneme "u"
as in the English word "who", comprising identifying or
generating a transient pulse with a sine curve interpolation
and a duration of 1.0-2.0 ms, preferably about 1.5 ms.
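
One possible reading of a pulse "with a sine curve interpolation" is a half period of a sine wave stretched over the pulse duration. The sketch below generates such a pulse; the half-sine shape, the 48 kHz sample rate and the function name are assumptions made for illustration, and the claim itself does not define the interpolation in more detail.

```python
import math
from typing import List


def half_sine_pulse(duration_ms: float = 1.5, fs: int = 48000,
                    amplitude: float = 1.0) -> List[float]:
    """Generate a half-sine pulse rising from zero and returning to zero."""
    if not 1.0 <= duration_ms <= 2.0:
        raise ValueError("claim 48 gives a duration of 1.0-2.0 ms")
    n = round(duration_ms * 1e-3 * fs)  # number of sample intervals in the pulse
    return [amplitude * math.sin(math.pi * k / n) for k in range(n + 1)]
```

With the default values the pulse spans 72 sample intervals (1.5 ms at 48 kHz) and peaks at its midpoint.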

49. A method according to any of the claims 1-26 or 41-48, 
when used in speech recognition. 

50. A method according to any of the claims 1-7 or 41-48,
used in speech compression.

51. A method according to any of the claims 1-7 or 41-48, 
when used for synthesizing human speech, comprising
generating a series of transient pulses corresponding to the
series of phonemes which constitutes the speech to be
synthesized. 

52. A method according to claim 51, wherein the series of 
phonemes is established from a series of letters using
rule-based conversion.
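
Claims 51 and 52 describe going from letters to phonemes by rules and then to a series of transient pulses. A toy sketch of that flow follows. The rule table, the nominal durations and the function names are hypothetical: the three rules are illustrative only, letters without a rule are skipped, and the durations are picked from the ranges in claims 43, 44 and 48.

```python
from typing import List

# Hypothetical spelling rules, longest pattern first. Illustrative only.
RULES = [("ea", "i"), ("oo", "u"), ("o", "o")]

# Nominal pulse durations in ms, chosen from the claimed ranges:
# "i" about 0.8 ms, "o" within 1.3-1.8 ms, "u" about 1.5 ms.
NOMINAL_DURATION_MS = {"i": 0.8, "o": 1.5, "u": 1.5}


def letters_to_phonemes(text: str) -> List[str]:
    """Scan the text left to right, applying the first rule that matches."""
    text = text.lower()
    phonemes: List[str] = []
    pos = 0
    while pos < len(text):
        for pattern, phoneme in RULES:
            if text.startswith(pattern, pos):
                phonemes.append(phoneme)
                pos += len(pattern)
                break
        else:
            pos += 1  # no rule for this letter: skip it
    return phonemes


def pulse_durations(text: str) -> List[float]:
    """Map each phoneme to the nominal duration of its transient pulse."""
    return [NOMINAL_DURATION_MS[p] for p in letters_to_phonemes(text)]
```

A full synthesizer would also need pulse shapes for the remaining phonemes, which this section of the claims does not give.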

53. A method according to any of the claims 1-7 or 41-48,
used in quality measurement of audio products, the audio
products preferably being loudspeakers, hearing aids or
telecommunication systems.

54. A method according to any of the claims 1-7 or 41-48,
used in quality measurement of acoustic conditions in a room
or in the open.

55. A system for processing an auditory signal to reduce the
bandwidth of the signal with substantial retention of the
information of the signal, comprising means for extracting 
the transient component of the auditory signal, and 

means for detecting an envelope of the transient component. 

56. A system according to claim 55, further comprising means
for identifying or representing the energy changes on the 
basis of the shape of the transient pulses. 

57. A system according to claims 55 or 56, wherein the means
for transient component extraction comprises a bandpass 
filter or a highpass filter. 

58. A system according to any of the claims 55-57, wherein 
the envelope detection means comprises a rectifier and a
lowpass filter. 

59. A system according to claim 57 or 58, wherein the lower 
cutoff frequency of the bandpass or highpass filter is at
least 2 kHz, such as about 3 kHz. 

60. A system according to any of the claims 57-59, wherein 
the upper cutoff frequency of the bandpass filter is in the 
range between 4.5 and 7 kHz, preferably about 6 kHz.

61. A system according to any of the claims 58-60, wherein
the rectifier is a one-way rectifier.

62. A system according to any of claims 58-61, wherein the 
cutoff frequency of the lowpass filter is in the range of 
400-1000 Hz, preferably about 700 Hz.

63. A system according to claim 55 or 56, wherein the
envelope detection means comprises a filter bank.

Fig. 14c

Pretransient signal.
Fig. 15b

Transient signal.
Fig. 15c

Time Window, Normalized.
Fig. 16b

Time Window, Normalized.
Fig. 17a

Time Window, Normalized.

Time Window, Normalized.
Fig. 18

Fig. 24